Abstract

ZeroAccess is a large, sophisticated botnet whose modular design allows new “modules” to be downloaded
on demand. Typically each module corresponds to a particular scam used to monetize the platform.
However, while the structure and behavior of the ZeroAccess platform are increasingly well understood,
the same cannot be said about the operation of these modules. In this report, we fill in some of these
gaps by analyzing the “auto-clicking” and “search-hijacking” modules that drive most of ZeroAccess’s
revenue creation. Using a combination of code analysis and empirical measurement, we document the
distinct command and control protocols used by each module, the infrastructure they use, and how they
operate to defraud online advertisers.


1  Introduction 

Botnets, large collections of malware-infected computers controlled by a single entity and working
together for a common goal, have become the modern platform via which mass cybercrime is waged. In
particular, botnets today are central to a broad range of scams including e-mail spam, credentials
theft, account abuse, denial-of-service, search engine optimization and click fraud. The structure and
composition of these botnets, how they maintain command and control, and the means by which they are
monetized have evolved considerably over the last decade, with each new generation offering additional
resilience and flexibility.

Today one of the more sophisticated products of this evolution is ZeroAccess. Active in a variety of
forms since 2009 [5], ZeroAccess is one of the largest botnets in operation today with over 1.9 million
infected computers estimated to fill its ranks as of August 2013 [17]. ZeroAccess is further
distinguished by being the best known botnet primarily monetized via click fraud [17], and some
researchers estimate the resultant cost to advertisers at over $2.7 million USD per month [22].
It is this latter feature that motivates our report. Advertising drives much of today’s Web services,
generating over $20 billion in revenue in the first half of 2013 and growing at an estimated 20% per
year [18]. However, this same value has also attracted criminal actors who use a variety of techniques
to generate synthetic advertisement clicks to defraud advertising networks [4]. As a result, this click
fraud accounts for as much as 10% of all advertising clicks, potentially defrauding advertisers of
hundreds of millions of dollars annually, with some experts predicting the rate increasing by more than
50% per year [20].

As one of the largest click fraud botnets in existence, ZeroAccess’s operations are of unique interest
in understanding how mass click fraud campaigns are perpetrated. Much of ZeroAccess has been well
studied, including the infection vector and the peer-to-peer (P2P) command-and-control (C&C) protocol
[5, 17, 22, 23], and several reports have identified its use of click fraud [17, 22]. However, we are
unaware of public work documenting the click fraud process in technical depth.

In this report we focus on documenting the click fraud behavior of the ZeroAccess botnet and the
infrastructure it uses in pursuit of this goal. We take a multifaceted approach, including a
combination of binary and network analysis, malware execution, and direct interaction with the botnet
C&C servers. In particular, we describe how ZeroAccess uses two different “modules” to carry out
distinct forms of click fraud: auto-clicking and search-hijacking.

Auto-Clicking: This ZeroAccess module automatically clicks on advertisements sent via the module’s
C&C. These clicks occur rapidly, unseen by the user and independent of any user interaction. Section 5
describes the behavior and infrastructure of the ZeroAccess auto-clicking module.

Search-Hijacking: This ZeroAccess module fetches ads relating to real search queries generated by the
user on the infected machine. When the user clicks on a search result, the module intercepts the click
and instead performs a separate fraudulent ad click related to that search query. It then redirects the
user to an advertiser’s Web site. Given the user interaction and the click’s relation to the search
query, such fraud may lead to advertising conversion,^1 resulting in higher revenue for the criminals.
We discuss this process, and the associated mechanisms used to achieve it, in greater depth in
Section 6.

The remainder of this report first explains the mechanics of Internet advertising and click fraud, the
history of ZeroAccess, and our measurement methodology. We then describe the basic structure of the
ZeroAccess malware distribution platform and provide a detailed description of each of its two click
fraud modules. In particular we document how each click fraud module uses its own C&C network (also
distinct from that used by the ZeroAccess platform) with well over a dozen server IP addresses
implicated in our analysis.

2  Background 

The ZeroAccess botnet exploits the Internet advertising ecosystem to profit by defrauding advertisers.
In this section we frame this ecosystem, the general problem of click fraud, and the evolution of the
ZeroAccess botnet.

2.1  Web  Advertising 

Advertising on the Web works in terms of arrangements between advertisers, who wish to display
promotional content, and publishers (such as blogs, news sites, or search engines), who receive visits
from users who could potentially view and respond to that content. Publishers receive payment for
displaying the advertiser’s content, which can consist of text, images, video, or other interactive
(e.g., Flash-based or JavaScript-based) media. Advertisements generally include links to the
advertiser’s site to allow interested users to directly engage with the advertiser by clicking on the
advertisement and visiting the site.

^1 In advertising, a conversion is a click that leads to some user interaction on the advertiser’s Web
page. What constitutes a conversion can vary based on advertiser. Such clicks are said to be “high
quality” because of the user interaction.

In  practice,  advertisers  and  publishers  often  do  not  deal  with  each  other  directly.  Instead,  each  contracts 
with  an  advertising  network  that  coordinates  ad  placement  between  many  advertisers  and  publishers.  In  a 
traditional  arrangement,  an  advertiser  buys  a  given  volume  of  advertising  from  the  ad  network,  usually  also 
specifying  a  set  of  keywords  defining  the  context  in  which  to  show  the  ad.  Publishers  then  join  an  advertising 
network  and  display  ads  provided  by  the  network. 

2.1.1  Pricing  and  Payments 

Advertising  networks  price  ads  in  one  of  three  basic  ways.  For  cost-per-impression  pricing,  the  advertiser 
pays for each end-user impression: essentially, whenever a browser loads their ad as part of a Web page.
Such  pricing  is  also  termed  cost-per-mille  (CPM),  reflecting  the  common  use  of  a  thousand  impressions 
as  the  usual  unit  of  pricing.  For  cost-per-click  (CPC)  pricing,  advertisers  pay  whenever  a  user  clicks  an 
ad,  presumably  reflecting  potential  interest  on  the  part  of  the  user.  For  cost-per-acquisition  (CPA)  pricing, 
advertisers  pay  when  a  user  converts,  evincing  engagement  with  the  advertiser’s  site  such  as  adding  an 
item  to  their  shopping  cart  or  signing  up  for  a  mailing  list.  CPC  ads  predominate  today  in  terms  of  ad 
revenue  [18]. 
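
To make the three pricing models concrete, the following minimal sketch computes what an advertiser
would owe for the same campaign under each one. The function name, the campaign counts, and the rates
are illustrative assumptions, not figures from any ad network.

```python
def campaign_cost(model, impressions, clicks, conversions, rate):
    """Return the advertiser's cost under one pricing model.

    rate is interpreted per model: dollars per 1,000 impressions (CPM),
    dollars per click (CPC), or dollars per conversion (CPA).
    """
    if model == "CPM":
        return impressions / 1000 * rate
    if model == "CPC":
        return clicks * rate
    if model == "CPA":
        return conversions * rate
    raise ValueError(f"unknown pricing model: {model}")


# Hypothetical campaign: 100,000 impressions, 500 clicks, 10 conversions.
cpm_cost = campaign_cost("CPM", 100_000, 500, 10, rate=2.0)
cpc_cost = campaign_cost("CPC", 100_000, 500, 10, rate=0.5)
cpa_cost = campaign_cost("CPA", 100_000, 500, 10, rate=20.0)
```

Under CPC the cost scales directly with the click count, so every forged click adds to the
advertiser's bill.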

Another consideration of relevance to the mechanics of advertising and pricing concerns whether the
publisher offers search or contextual placement for the ads. To match ads to relevant publisher sites,
advertisers usually provide associated keywords. Search ads are placed by publishers (such as search
engines) in response to specific user searches on the publisher’s site, while contextual ads reflect
matches to words present on a publisher’s site (such as a blog post). Since users have more explicit
intent when searching, search ads typically lead to higher conversions and thus cost more.

With cost-per-click (CPC), publishers usually take a 70% cut of the cost of the ad, while the other 30%
goes to the ad network. CPC can be quite lucrative for publishers, with prices for certain keywords as
high as $50 per click [9].

Publishers may have syndication arrangements with other, smaller publishers (for example, a large
search engine syndicating to a smaller search engine), where the publisher may choose to provide ads
from the ad network to the smaller publisher. When such a syndicated ad results in a click, the
syndicating publisher takes a cut of the profit. Syndication chains can be arbitrarily long; e.g.,
publisher A may sub-syndicate to publisher B, who may in turn sub-syndicate to publisher C, and so on.
Typically every publisher along the chain earns money for a click, the amount depending on the type of
syndication agreement (flat-rate vs. a fraction of the profit).
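
The revenue flow described above can be sketched as follows. The 70%/30% split comes from the text;
the per-publisher syndication fractions and the function name are hypothetical, chosen only to
illustrate a fraction-of-the-profit agreement.

```python
def split_click_revenue(click_price, publisher_share=0.70, syndication_cuts=()):
    """Split one CPC click's price among the ad network and a syndication chain.

    syndication_cuts[i] is the fraction of its incoming revenue that the i-th
    publisher keeps before passing the remainder to the next, smaller publisher.
    The last publisher in the chain keeps whatever reaches it.
    Returns (network_revenue, [revenue of each publisher along the chain]).
    """
    network = click_price * (1 - publisher_share)
    remaining = click_price * publisher_share
    payouts = []
    for cut in syndication_cuts:
        kept = remaining * cut
        payouts.append(kept)
        remaining -= kept
    payouts.append(remaining)  # final publisher, where the click happened
    return network, payouts


# A $1.00 click with no syndication: one publisher, one network.
network, payouts = split_click_revenue(1.00)

# Publisher A keeps 20% and sub-syndicates to B, who keeps 50% and passes
# the rest to C (both fractions are made-up values).
network_chain, payouts_chain = split_click_revenue(1.00, syndication_cuts=(0.2, 0.5))
```

Whatever the chain length, the payouts and the network's share always add up to the price of the
click, so every party along the chain has a stake in the click being counted.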

2.1.2  Anatomy  of  a  click 

When an advertiser decides to advertise with an ad network, the advertiser provides the ad content to
display to users along with a landing URL on the advertiser’s site to which clicks on the ad should
take users. When a publisher decides to use a particular ad network to populate some part of its Web
page, the network provides the publisher with a code snippet to fetch ads from the network’s servers.
This snippet can be embedded in the Web page (typically using JavaScript), or it can fetch ads in the
background that it then formats before displaying to the user.

Figure 1 diagrams an example interaction between a user visiting a publisher site, an ad network, and
an advertiser. When a user receives a requested publisher page (Steps 1-2), the JavaScript snippet
embedded in the page contacts the ad servers to request ads (Step 3), identifying the publisher that
made the request. In response, the ad server decides which ads to provide, and logs the request, an ad
impression (Step 4). Next, the ad server returns to the user’s browser the ad content along with a
unique ad URL (Step 5). The ad URL contains an identifier used for linking it to the advertiser.

Figure 1: Typical anatomy of an ad click, showing the various HTTP requests associated with a user
clicking on an advertisement, leading them to an advertiser’s landing page, and from there possibly to
additional interactions.

If the user chooses to click on the ad (Step 6), the browser makes an HTTP request for fetching the ad
URL from the ad server. At this point, the ad server logs an ad click (Step 7) and, if using the
predominant CPC pricing, charges the advertiser associated with that ad. After logging the click, the
server redirects the user’s browser to the advertiser’s site, typically using an HTTP redirect response
code (Steps 9-11), though other mechanisms are possible.

After landing at the advertiser Web site, the user may choose to perform an action desirable to the
advertiser (Step 12), a conversion. The advertiser decides what constitutes a conversion, in general
choosing actions that represent a tangible return on investment. Most ad networks provide optional
conversion-tracking by having the advertiser embed a single-pixel image hosted by the ad network on the
desirable page, which enables linking together the click and the conversion events based on a cookie
value (Step 16).
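
The click lifecycle above can be modeled in a few lines. This is a toy sketch, not any ad network's
implementation: the class, its method names, the URL layout, the choice of 302 as the redirect code,
and the use of a random token as the cookie value are all assumptions made for illustration.

```python
import uuid


class ToyAdServer:
    """Minimal model of the impression, click, and conversion logs."""

    def __init__(self, cpc_price):
        self.cpc_price = cpc_price
        self.impressions = []   # (publisher_id, ad_id) pairs
        self.clicks = {}        # cookie value -> ad_id
        self.conversions = []   # ad_ids whose clicks converted
        self.charged = 0.0

    def request_ad(self, publisher_id, ad_id):
        """Steps 3-5: log an impression and return an ad URL carrying the ad identifier."""
        self.impressions.append((publisher_id, ad_id))
        return f"/click?ad={ad_id}&pub={publisher_id}"

    def click(self, ad_id, landing_url):
        """Steps 6-11: log the click, charge the advertiser (CPC), and redirect."""
        cookie = uuid.uuid4().hex
        self.clicks[cookie] = ad_id
        self.charged += self.cpc_price
        return 302, landing_url, cookie

    def conversion_pixel(self, cookie):
        """Step 16: the tracking pixel links a conversion back to its click."""
        ad_id = self.clicks.get(cookie)
        if ad_id is not None:
            self.conversions.append(ad_id)
        return ad_id is not None


server = ToyAdServer(cpc_price=0.50)
ad_url = server.request_ad("pub-1", "ad-42")
status, location, cookie = server.click("ad-42", "http://advertiser.example/landing")
linked = server.conversion_pixel(cookie)
```

Note that the advertiser is charged in `click`, before anything is known about a conversion; this
ordering is what click fraud exploits.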

2.2  Click  Fraud 

In  the  context  of  CPC  ads,  click  fraud  is  the  practice  of  fraudulently  generating  clicks  on  CPC  ads  without 
any  intention  of  fruitfully  interacting  with  the  advertiser’s  site.  As  a  result,  advertisers  lose  money,  receiving 
no  return  on  their  investment. 

There  are  two  primary  motivations  for  click  fraud.  First,  a  malicious  advertiser  can  employ  click  fraud 
targeting  a  competitor’s  ad  to  deplete  their  advertising  budget  [21].  A  stronger  motivation,  however,  lies 
with  publishers,  who  directly  profit  from  ads  clicked  on  their  site.  Ad  networks  also  profit  from  click  fraud, 
though  reputable  ad  networks  that  want  to  maintain  long  term  relationships  with  advertisers  will  presumably 
attempt  to  identify  and  weed  out  click  fraud  activity.  Note  that  it  is  hard  for  an  ad  network  to  prove  that  a 
particular  click  was  fraudulent  because  it  is  equivalent  to  guessing  the  intent  of  the  user  behind  the  click. 
An attacker can perform click fraud in various ways. Simple approaches involve hiring people to click
on ads (termed click fraud farms) [6] or running stand-alone scripts that repeatedly retrieve the URLs
associated with ads, simulating user clicks (stand-alone click fraud bots) [13]. As ad networks’
defenses have evolved, so have the means of click fraud. The more complex approaches include search
engine hijacking [19], where a malicious in-browser plugin replaces ad links in results returned for
user searches with other ads, confusing a user into clicking; and the use of click fraud botnets
[16, 22], i.e., groups of malware-infected hosts that fetch and click ads unbeknownst to the user.

Of these, botnets can pose a very difficult case for an ad network to detect due to the large
geographic distribution from which the fraudulent clicks appear, which combined with a low click fraud
intensity can make the activity seemingly indistinguishable from genuine user traffic [3]. One tactic
that many ad networks employ, smart pricing [7], discounts clicks from publishers that subsequently do
not lead to conversions, based on the assumption that many forms of click fraud can force clicks but
have difficulty producing conversions.
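
The idea behind smart pricing can be illustrated with a simple discount rule. The text does not give
an actual formula, so the rule below (scale the click price by the publisher's conversion rate
relative to a network-wide baseline, capped at full price) is purely an assumption, as are all names
and numbers.

```python
def smart_price(base_cpc, publisher_clicks, publisher_conversions, baseline_rate):
    """Discount a publisher's click price when its clicks rarely convert."""
    if publisher_clicks == 0:
        return base_cpc  # no history yet, so no discount in this sketch
    conversion_rate = publisher_conversions / publisher_clicks
    factor = min(1.0, conversion_rate / baseline_rate)
    return base_cpc * factor


# A publisher converting at the baseline rate (2%) is paid the full price...
honest = smart_price(1.00, publisher_clicks=1000, publisher_conversions=20,
                     baseline_rate=0.02)
# ...while one whose clicks never convert earns nothing under this rule.
fraudulent = smart_price(1.00, publisher_clicks=1000, publisher_conversions=0,
                         baseline_rate=0.02)
```

Under a rule of this shape, fraud that produces occasional conversions would be discounted less than
fraud that produces none.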

2.3  ZeroAccess 

ZeroAccess is a complex botnet that has undergone several stages of evolution, which we recount here.
Although first described solely as a “rootkit,” ZeroAccess has developed into a vast peer-to-peer (P2P)
botnet and malware delivery platform.

2.3.1  Early  Life,  2009-2011 

Initial reports of the “ZeroAccess rootkit” date to 2009 [5]. In 2010, the InfoSec Institute’s detailed
analysis described it as a “platform to deliver malicious software” [2]. At this stage, the main
malware delivered using ZeroAccess was “FakeAV”,^2 with an estimated 250,000 computers infected.

2.3.2  First  generation  P2P  Botnet,  2011-Present 

In  May  2011  a  radically  new  version  of  ZeroAccess  emerged  [17].  In  this  iteration,  ZeroAccess  retained 
its  kernel-mode  rootkit  components,  but  changed  both  its  communication  and  monetization  strategies.  This 
version  spread  itself  via  exploit  packs  (e.g.,  BlackHole  [8])  and  social  engineering  [14,  23]. 

The  defining  feature  of  this  iteration  of  the  botnet  was  the  introduction  of  a  decentralized,  TCP-based 
P2P  communication  protocol.  The  protocol  used  cryptography  and  obfuscation  as  well  as  other  common 
P2P  features  such  as  “super  nodes”  that  served  to  orchestrate  large  portions  of  the  network’s  activity.  The 
network  allowed  the  botmasters  to  maintain  decentralized  control  while  relaying  commands  and  payloads 
to  infected  computers  worldwide  [22]. 

The  P2P  protocol  included  cryptographic  signing  of  malicious  payloads,  which  hardened  the  botnet 
against  attempted  hijacking  by  preventing  untrustworthy  peers  in  the  botnet  from  successfully  delivering 
payloads  other  than  those  cryptographically  signed  by  the  actual  botmaster  [22]. 

In  addition,  the  monetization  strategy  changed  with  this  generation.  ZeroAccess  moved  away  from 
FakeAV payloads and instead began distributing Bitcoin miners and click fraud modules.^3 From a technical
perspective,  the  primary  click  fraud  malware  used  in  this  era  operated  in  the  indiscriminate  “auto-clicking” 
fashion  we  describe  in  Section  5. 

Alongside the click fraud and Bitcoin payloads, ZeroAccess itself was also sold as a service on
underground forums [17], enabling cyber-criminals to use the ZeroAccess rootkit to distribute their own
malicious payloads.

^2 FakeAV is malware that claims to be anti-virus software to extort users into paying money to remove
fictitious infections.
^3 ZeroAccess’s shift away from FakeAV occurred just before a major takedown that resulted in the
closure of most FakeAV programs [11].
Module             MD5                                Last Obtained
Auto-Clicking      51ba6261e44c60b2f891fabfaa47d0ad   Nov. 22, 2013
Search-Hijacking   7128a957f5c9c9a69385f5332ca6338c   Nov. 22, 2013
Search-Hijacking   3aec103d38c7520229e18af260c5a00d   Sep. 26, 2013
Search-Hijacking   36616e8f309b35f8e090068690272239   June 14, 2013
Search-Hijacking   8fa08c59e4d205e514f8a978678ba798   May 30, 2013

Table 1: Auto-Clicking and Search-Hijacking module executables used in the analysis.


This iteration of the botnet also saw an increase in botnet population. At the height of infections in
early 2012, estimates placed the botnet population at over 500,000 [15]. Despite the age of the botnet
and its subsequent evolution, as of August 2013 there were still over 30,000 computers infected with
this generation [17].

2.3.3  Second  generation  P2P  Botnet,  2012-Present 

In July 2012, ZeroAccess evolved into the form predominant as of November 2013. According to Symantec,
by August 2013, this generation had an estimated population of over 1.9 million computers [17]. This
iteration included several changes to the malware structure, the protocol, and the payloads. The most
distinguishing change to ZeroAccess in this era was a move away from the kernel-mode rootkit component,
with all of its functionality now replicated in user-space [17]. Other changes include a move to UDP
from TCP for the P2P protocol (likely to improve network performance), and minor changes in the
protocol itself.

The monetization strategy also evolved. This version saw the introduction and massive distribution of a
new click fraud payload performing search-hijacking, which we discuss in Section 6.

3  Methodology 

In this section we describe our malware execution environment, manual analysis techniques, and the
ZeroAccess modules we examine.

3.1  Collecting  Module  Sample  Executables 

Table 1 lists the modules used in our analysis of the click fraud modules. We obtained our samples of
ZeroAccess by searching malware repositories for traffic patterns consistent with ZeroAccess
command-and-control (C&C) behavior and executing each binary in our execution environment
(Section 3.2). We have identified thousands of binaries with ZeroAccess C&C behavior found in October
and November, 2013.

During execution, ZeroAccess retrieves the auto-clicking and search-hijacking modules for execution.
While ZeroAccess transfers modules in an encrypted form, using reverse engineered decryption
routines [22] we built a tool that automatically extracts modules from network traces.

3.2  Monitored  Execution  Environment 

We execute each binary in a virtualized environment provided by the GQ honeyfarm [12], which supports
monitoring malware execution while providing a flexible network containment policy. We use Windows XP
Service Pack 3 for all executions. The system can process thousands of binaries per day.
In our experiments the execution environment allows ZeroAccess C&C P2P (UDP) traffic. For all
executions  we  forward  HTTP  traffic  to  the  intended  destination  and  redirect  all  other  non-C&C  TCP  traffic 
to  internal  sinks.  For  DNS,  our  service  can  answer  all  queries,  even  requests  without  a  valid  answer  or 
directed  at  external  DNS  servers.  This  feature  ensures  that  domain  takedowns  during  our  analysis  have 
limited  impact  on  malware  execution.  The  configurable  nature  of  the  DNS  server  behavior  enables  us  to  test 
ZeroAccess  samples  with  and  without  DNS  resolution.  For  all  other  protocol  types  we  provide  a  sink  that 
will  accept  packets  but  does  not  respond. 
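
The containment policy above amounts to a small decision function over each outbound flow. The sketch
below is an illustration only: the action names, the use of ports 80 and 53 to recognize HTTP and DNS,
the handling of C&C traffic over TCP, and the two classifier flags are assumptions, not details of the
GQ honeyfarm.

```python
def containment_action(protocol, dst_port, is_zeroaccess_p2p=False, is_cnc=False):
    """Map one outbound flow to a containment action.

    protocol is "udp", "tcp", or another protocol name. The two flags are
    assumed to come from a separate traffic classifier.
    """
    if protocol == "udp" and is_zeroaccess_p2p:
        return "allow"              # ZeroAccess P2P C&C traffic
    if protocol in ("udp", "tcp") and dst_port == 53:
        return "answer_locally"     # DNS handled by the environment's own service
    if protocol == "tcp" and dst_port == 80:
        return "forward"            # HTTP goes to the intended destination
    if protocol == "tcp" and is_cnc:
        return "allow"              # assumption: C&C over TCP is not sunk
    if protocol == "tcp":
        return "redirect_to_sink"   # all other non-C&C TCP
    return "silent_sink"            # accepts packets but never responds


actions = [
    containment_action("udp", 40000, is_zeroaccess_p2p=True),
    containment_action("tcp", 80),
    containment_action("tcp", 25),
    containment_action("udp", 53),
    containment_action("icmp", 0),
]
```

Ordering the checks from most to least specific keeps the default, the silent sink, as the fallback
for anything the policy does not explicitly name.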

In addition to network monitoring, our system collects operating system events, including process
creation, file modifications, and registry changes.

3.3  Binary  Analysis 

For static binary analysis we use IDA Pro 6.4 with the Hex-Rays decompiler.^4 ZeroAccess distributes
modules as standard Windows DLLs, a file format natively supported by IDA such that it can disassemble
and decompile the modules with Hex-Rays. We use static binary analysis to obtain the encryption (and
decryption) algorithms, domains, and other C&C protocol information for the ZeroAccess modules.

3.4  Milking 

A  milker  is  a  program  that  speaks  a  particular  botnet’s  C&C  protocol  and  mimics  the  communications  of  that 
malware.  Through  the  use  of  a  milker,  we  can  query  information  and  commands  at  a  much  larger  scale,  with 
finer  granularity,  and  across  more  diverse  geographic  regions,  than  with  traditional  malware  executions.  This 
technique  also  allows  us  to  probe  specific  protocol  behaviors  in  a  way  that  directly  executing  the  malware 
might  not  manifest,  and  to  obtain  C&C  commands  without  potentially  dangerous  malware  side  effects. 

For  this  work  we  created  a  milker  for  the  ZeroAccess  search-hijacking  module’s  C&C  protocol  and  used 
it  to  interact  with  the  module’s  C&C  servers.  The  milker  queries  the  C&C  server  and  retrieves  a  list  of  ads 
to  click  on,  then  simulates  a  click  on  one  of  the  results  using  a  headless  Web  browser.  The  browser  follows 
redirects,  executes  JavaScript,  and  in  general  is  designed  to  perform  similarly  to  a  victim’s  browser.  We  ran 
our  milker  over  several  days  and  describe  some  of  the  data  gathered  in  detail  in  Section  6. 

4  The  ZeroAccess  Platform 

In  this  section  we  describe  the  base  ZeroAccess  platform:  the  botnet  software  responsible  for  coordinating 
communication  among  very  large  numbers  (millions)  of  infected  computers  around  the  world. 

4.1  Infection 

The first step in a ZeroAccess victim’s lifecycle is becoming infected with ZeroAccess. Like many other
malware families, ZeroAccess is distributed in a variety of ways, such as drive-by download, social
engineering, and pirated software [14]. Each distribution vector results in the installation of
software that participates in the ZeroAccess C&C.

An  example  of  this  process  we  have  observed  begins  when  a  victim  browses  the  Web  and  inadvertently 
visits  a  compromised  Web  site  hosting  an  exploit  kit.  The  exploit  kit  detects  the  victim’s  browser  version  and 
delivers  an  exploit  payload.  The  payload  has  two  functions:  1)  exploit  a  browser  vulnerability,  and  2)  deliver 
malware.  Upon  successful  browser  exploitation,  the  payload  downloads  and  executes  a  ZeroAccess  binary. 
Once  executed,  ZeroAccess  has  control  of  the  victim’s  computer  and  begins  to  communicate  using  the  P2P 
C&C  protocol. 

^4 https://www.hex-rays.com/products/ida/index.shtml
4.2 Command and Control


The ZeroAccess platform uses a P2P protocol for its C&C, with the primary function of distributing
modules and performing updates. The P2P protocol, described in greater detail in other reports
[17, 22], supports the promotion of a member to a super node. As described by Neville et al., super
nodes store ZeroAccess modules and provide them to other nodes upon request [17]. In addition to
distributing modules to newly infected hosts, super nodes also host new versions of modules when
updated. It is important to note that aside from distributing the click fraud modules, the P2P C&C
protocol does not play a role in the execution of click fraud.

Once a victim has been infected with ZeroAccess, it begins by bootstrapping the P2P protocol using a
peer list embedded in the binary [17]. The P2P protocol discovers new peers, updates the peer list, and
adds itself to the peer list for new nodes to contact. Once the newly infected machine joins the
ZeroAccess P2P network, it begins to download modules as instructed by other peers in the network. A
super node hosts the modules, and the malware issues a download request to fetch and then execute the
module. This process occurs shortly after infection and results in a click fraud module download. Once
that module executes, the victim’s computer begins to carry out click fraud using a separate C&C
protocol as described in Sections 5 and 6. When an update to the click fraud modules becomes available
on the P2P network, the victim learns of the update from one of its peers and contacts a super node to
retrieve the latest version.

5  The  Auto-Clicking  Module 

The ZeroAccess auto-clicking module performs click fraud by simulating normal Web browser behavior of
a user clicking on a Web advertisement. This activity requires no user participation, and is not
visible to the user.^5 We present the behavioral analysis of this module, treating the module itself as
a black box and observing its contained execution. We observed it performing about one click every two
minutes. To evade detection, these clicks were spread across multiple ad networks.

5.1  Behavior 

The function of the auto-clicking module is to simulate a user “click” on an advertisement. Figure 2
shows the operation of the module. The module begins by contacting one of its command-and-control
(CF-C&C) servers with a request for click fraud jobs (Step 1). The C&C server returns a scrambled
payload containing a list of “click” jobs. Each job is identified by a host name, a first hop URL and
the HTTP Referer^6 URL. The module then issues an HTTP request to another CF-C&C server, setting the
Host header value as specified by the job. This server then redirects the request via an HTTP 303
redirect to a URL, the same URL as in the job (Step 2). This forms the first hop in the redirection
chain. The module then retrieves the URL to which it is redirected, setting the Referer header as
given in the job (Step 3). The redirect chain continues normally from this point on, and after a series
of redirects the bot fetches the ad URL, at which time the advertiser gets charged (Step 4).
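
The request sequence described above can be summarized as data. The sketch below performs no network
I/O; it only lays out which headers the text says accompany each step. The `ClickJob` type, its field
names, the request path, and the example values (reserved documentation names and addresses) are
hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClickJob:
    host_name: str       # value placed in the Host header (Step 2)
    first_hop_url: str   # URL the second CF-C&C server redirects to
    referer: str         # Referer header used when fetching that URL (Step 3)


def planned_requests(job, second_cnc_ip):
    """Return the two requests issued for one job, as plain dictionaries."""
    return [
        # Step 2: contact another CF-C&C server with the Host header from the job.
        # The expected answer is an HTTP 303 pointing at job.first_hop_url.
        # The "/" path is a placeholder; the text does not give the real one.
        {"url": f"http://{second_cnc_ip}/", "headers": {"Host": job.host_name}},
        # Step 3: fetch the first hop with the supplied Referer.
        {"url": job.first_hop_url, "headers": {"Referer": job.referer}},
    ]


job = ClickJob(
    host_name="jobs.example",
    first_hop_url="http://first-hop.example/r",
    referer="http://publisher.example/page",
)
requests_plan = planned_requests(job, "192.0.2.10")
```

Laying the steps out this way makes two observable features explicit: a Host header that does not
correspond to the contacted IP address, and a Referer that the client never actually visited.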

There is no visible activity in the foreground, so it is difficult for a user to detect that their
computer is performing click fraud in the background. This same process repeats multiple times.

^5 Some users, however, may be alerted to the presence of this module by the increased network
activity; in one instance, we observed about 50 MB of network traffic per hour.

^6 An HTTP Referer header provides the server with the URL of the referring page, that is, the page
that purportedly contained the URL being requested.


[Figure 2 diagram. Labels: Browser (background), no user-visible activity; Module; CF-C&C; First hop
server; chain of HTTP and JavaScript redirects leading to advertiser site. Messages: (request) and
(scrambled payload), Step 1; HTTP GET and HTTP 303 (redirect), Step 2; HTTP GET, Step 3; HTTP GET and
HTTP 200, Step 4.]

Figure 2: Behavior of the auto-clicking module. The module begins by retrieving a list of “click” jobs
from its C&C server (Step 1). For each job, it uses the system browser to retrieve the URL (Step 2) and
follows HTTP and JavaScript redirects (Steps 3 and 4).


5.2  Command  and  Control 

This section describes how the bot communicates with the CF-C&C servers to fetch click jobs. Table 2
lists the IP addresses of the CF-C&C servers that we observed the bot contacting for fetching commands.
Note that these change over time.

The bot contacts one of the CF-C&C servers over TCP port 12757, and sends an obfuscated message
(message string XORed by 0x72) identifying the browser User Agent string. In response, the CF-C&C
server sends a response (also obfuscated) that contains the following: a domain name, a list of first
hop URLs to be contacted and a set of Referer headers to be set.
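
Single-byte XOR obfuscation of this kind is symmetric, so the same routine both scrambles and
unscrambles a message. The sketch below applies the key 0x72 from the text to every byte. Whether the
real protocol adds framing or delimiters around the XORed string is not covered here, and the sample
User Agent string is made up.

```python
XOR_KEY = 0x72  # key reported in the text


def xor_obfuscate(data: bytes, key: int = XOR_KEY) -> bytes:
    """XOR every byte with a single-byte key; applying it twice restores the input."""
    return bytes(b ^ key for b in data)


message = b"Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1)"
scrambled = xor_obfuscate(message)
recovered = xor_obfuscate(scrambled)
```

An analyst could apply the same function to payloads captured in a network trace to read them in the
clear.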

After  receiving  the  first  hop  URLs,  the  auto-clicking  module  does  not  fetch  the  first  hop  URLs  directly. 
Instead,  it  first  contacts  one  of  the  other  CF-C&C  servers  over  HTTP  port  80,  with  the  Host  header  set  to 
the  domain  name  provided  earlier.  Presumably  this  is  an  authentication  mechanism;  earlier  versions  of  the 


IP Address        Observed Date   Location
94.242.195.162    21 Nov 2013     Luxembourg
94.242.195.163    21 Nov 2013     Luxembourg
94.242.195.164    21 Nov 2013     Luxembourg
81.17.18.18       21 Nov 2013     Switzerland
81.17.26.189      21 Nov 2013     Switzerland
46.19.137.19      21 Nov 2013     Switzerland

Table 2: IP addresses of C&C servers observed for the auto-clicking module in Table 1. The
Observed Date gives the date on which we observed communication between the module and these
servers. Location is based on the MaxMind [1] GeoIP service.
Hop  URL                                    Status Code  Notes
1    http://46.19.137.19...                 303          Host name set to evmrpznw.em
2    http://1556987547.traffiliator.com...  302          Referer spoofed
3    http://unlimiclick.com/bd...           200
4    http://ads.clicksor.cn...              200
5    http://poomedia.com/ad...              200          Loaded in iframe
6    http://us.ad2mi.com...                 302          Loaded in 1x1 pixel iframe
7    http://searchists.com/search...        200          JavaScript redirect
8    http://searchists.com/click/...        302
9    http://click.local.com...              302
10   http://1389.r.msn.com...               302          Ad URL fetch, advertiser charged
11   Advertiser                             200

Table 3: Example redirection chain corresponding to an auto-clicking module “click” from
November 21, 2013.


module exhibited similar behavior of not fetching the URLs directly [22]. In response, the second
CF-C&C server sends an HTTP 303 response code, redirecting the browser to one of the URLs in the
list. At this point, the auto-clicking module inserts a supplied Referer header and a click chain
begins. After a series of redirects, the ad URL gets fetched, resulting in the advertiser getting
defrauded.

5.3  Example  redirect  chain 

Table 3 shows a redirect chain generated by the auto-clicking module, starting from when the bot
contacts the C&C server to authenticate itself by setting the Host header. In response, the
CF-C&C server at 46.19.137.19 redirects the bot to 1556987547.traffiliator.com. This URL was
present in the original click fraud job list, and the bot now inserts the corresponding referrer
before fetching the URL, which eventually causes a banner to be fetched in an iframe from
poomedia.com. In addition to the banner, poomedia.com also populates the banner iframe with
another 1x1 pixel iframe. This second iframe is loaded with ad URLs and JavaScript code that
automatically fetches one of the links at random. This step results in an ad click, which is
eventually redirected through other publishers to the ad network, and eventually to an advertiser.
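To make the structure of such a chain easier to reason about, the hops of Table 3 can be modeled
as plain data and split into server-side (3xx) redirects and client-side transitions. This is
only an illustrative sketch: the `Hop` type and the helper names are our own, and the URLs are
kept truncated exactly as in the table.

```python
from typing import List, NamedTuple


class Hop(NamedTuple):
    number: int
    url: str      # truncated exactly as in Table 3
    status: int
    note: str = ""


CHAIN: List[Hop] = [
    Hop(1, "http://46.19.137.19...", 303, "Host name set to evmrpznw.em"),
    Hop(2, "http://1556987547.traffiliator.com...", 302, "Referer spoofed"),
    Hop(3, "http://unlimiclick.com/bd...", 200),
    Hop(4, "http://ads.clicksor.cn...", 200),
    Hop(5, "http://poomedia.com/ad...", 200, "Loaded in iframe"),
    Hop(6, "http://us.ad2mi.com...", 302, "Loaded in 1x1 pixel iframe"),
    Hop(7, "http://searchists.com/search...", 200, "JavaScript redirect"),
    Hop(8, "http://searchists.com/click/...", 302),
    Hop(9, "http://click.local.com...", 302),
    Hop(10, "http://1389.r.msn.com...", 302, "Ad URL fetch, advertiser charged"),
    Hop(11, "Advertiser", 200),
]


def http_redirects(chain: List[Hop]) -> List[Hop]:
    """Hops answered with a 3xx status, i.e. redirects visible in HTTP headers."""
    return [h for h in chain if 300 <= h.status < 400]


def client_side_transitions(chain: List[Hop]) -> List[Hop]:
    """Non-final hops answered with 200: the next fetch is triggered by page content."""
    return [h for h in chain[:-1] if h.status == 200]


header_hops = [h.number for h in http_redirects(CHAIN)]
content_hops = [h.number for h in client_side_transitions(CHAIN)]
```

Separating the two kinds of hop highlights why the chain is hard to follow from HTTP headers
alone: several transitions happen only inside rendered page content.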

5.4  Entities 

Depending on the syndication arrangement between the different parties in a redirection chain, all
of them stand to gain from a fraudulent click that an advertiser pays for, and thus any one of the
publishers may be working with the botnet. Over time, from our observations and others [22],
different versions of this module have been seen to defraud all major CPC ad networks, including
AdCenter, AdWords, VSearch, affinity, and adsimilate. We speculate that this module evades ad
network detection by spreading click fraud across a large number of ad networks to hide the high
volume of click fraud performed.

Since this click fraud is invisible to the end user (unlike the search-hijacking click fraud
module), the user is unlikely to convert. Given that, the use of smart pricing (cf. Section 2.2)
should in theory discount such malware-driven clicks. However, because of the large number of hops
in the syndication chain and the JavaScript redirects that hide the true length or origin of the
traffic, it becomes extremely difficult for an ad network to identify that the traffic is being
driven by a malware source. The click fraud traffic mixes in with other legitimate (and
converting) traffic from its known syndicators, thus undermining the use of smart pricing.
[Figure 3 (diagram). Labels recoverable from the diagram: search engine; SH-C&C; intended server.]

Figure 3: Behavior of the search-hijacking module. Step 1: A user enters a term into a search
engine. In this example, the user searches for “bike”. Step 2: The user’s browser performs an HTTP
GET for that term. In response, the user is presented with the unaltered search result from the
search provider. Step 3: In parallel to the user search, ZeroAccess sends the search term (“bike”)
to the ZeroAccess SH-C&C server. Step 4: The user clicks on a search result on the unaltered
results page. Step 5: ZeroAccess intercepts the click. Rather than going to the intended click
destination, the user is sent to one of the ad URLs supplied by the ZeroAccess SH-C&C in Step 3.
Step 6: The user’s browser displays the result of the ad click, an advertising landing page
related to their original query (“bike”).


6  The  Search-Hijacking  Module 

The search-hijacking module interposes on a user’s normal interaction with various search engines
in order to redirect the user to an advertisement that generates revenue for the botmaster. Such
search-hijacking represents a more sophisticated type of click fraud. Whereas the auto-clicking
module simulates a real user, the search-hijacking module sends a real user to the advertiser.
Because the advertiser’s site is relevant to the user’s search query, the user may in fact
interact with the advertiser’s site and trigger a conversion, as described in Section 2. We
describe the module’s search-hijacking behavior in more detail next, and then report on our
analysis of this module.

6.1  Behavior 

Once loaded, the module monitors the interaction between the user on an infected PC and the
browser, waiting for the user to issue a search query to a search engine (Step 1 in Figure 3). We
have confirmed that the module recognizes and hijacks Web searches performed using Google, Bing,
Yahoo, Ask, and ICQ Search. The module captures the query terms, while allowing the query to go
through to the intended search engine (Step 2). At the same time, the query terms are sent to the
search-hijacking module’s C&C (SH-C&C) server to retrieve a list of ad URLs to be used for later
hijacking (Step 3). When the user clicks on a search or ad result (Step 4), the normal click is
hijacked and the intended URL is replaced with the replacement ad URL retrieved from the SH-C&C
server (Step 5). The browser then fetches and renders the
Hop  URL                                              Status Code
1    217.23.3.223/...                                 302
2    http://feed.hype-ads.com/...                     302
3    http://search.freshcouponcode.com/search.php...  200
4    http://c.freshcouponcode.com/redrct.php...       200
5    http://c.freshcouponcode.com/click.php...        301
6    http://nn.xdirectx.com/clicklink.php...          302
7    http://2478799.r.msn.com/...                     302
8    Advertiser                                       200

Table 4: Example of a redirect chain corresponding to a “click” issued by the search-hijacking
module on November 14, 2013. Non-final hops with 200-level status codes trigger JavaScript or
Flash-type redirects.


replacement URL instead of the intended search result URL (Step 6). Although not shown in the
figure, retrieving the replacement URL may involve a chain of HTTP and JavaScript redirects.
Table 4 gives an example redirect chain of a click issued by the module.

The replacement and redirection process operates invisibly to the user. An unsuspecting user will
believe that the resulting page corresponds to the search result or ad that they clicked on the
search result page. It is important to note that neither the advertiser nor the ad network used in
the hijacked click may be aware that search-hijacking took place. From their point of view, a
hijacked user appears no different from a user arriving via normal search syndication.

Unlike traditional click fraud, such search-hijacking actually delivers a user to the advertiser.
Such a user may interact with the advertiser’s site and even convert, as described in
Section 2.1.2. Because some fraction of the users will convert, smart pricing may treat traffic
from the affiliate engaged in search-hijacking as legitimate and pay for each click.

6.2  Command  and  Control 

The search-hijacking module’s command-and-control (SH-C&C) protocol uses HTTP with a hard-coded
set of server addresses. We now detail the different elements of this protocol.

6.2.1  Commands 

The primary purpose of the SH-C&C channel is to retrieve a list of replacement URLs to which the
user will be redirected when clicking on a query result. When the user performs a search engine
query, the module makes an HTTP GET request to an SH-C&C server (Step 3 in Figure 3). The HTTP GET
request string is formed as shown in Figure 4. The user search terms, together with additional
parameters, are first formatted using standard URL parameter encoding, using the printf-style
format string:

v=5.4&id=%08x&aid=%u&sid=%u&q=%.*s&eng=%.*s&os=%s&br=%s&s=%u

The parameter string is then encoded using Base64 encoding and padded with 13 randomly-generated^
characters at the front and 10 similarly-generated characters at the end. The length of the
padding is such that a trivial decoding of the entire string does not reveal the contents of the
message. The resulting string is then sent to the SH-C&C server in the HTTP GET request.
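The padding scheme can be illustrated with a short Python sketch that wraps a parameter string
and recovers it again. The helper names are hypothetical, and two details are assumptions on our
part: that the padding is drawn from letters and digits, and that any trailing “=” characters may
or may not survive in the Base64 core (the decoder tolerates both cases).

```python
import base64
import secrets
import string

PREFIX_LEN, SUFFIX_LEN = 13, 10                    # padding lengths given in the text
PAD_ALPHABET = string.ascii_letters + string.digits  # assumption: alphanumeric padding only


def _random_pad(n: int) -> str:
    return "".join(secrets.choice(PAD_ALPHABET) for _ in range(n))


def encode_request(params: str) -> str:
    """Base64-encode the parameter string and surround it with random padding."""
    core = base64.b64encode(params.encode("ascii")).decode("ascii")
    return _random_pad(PREFIX_LEN) + core + _random_pad(SUFFIX_LEN)


def decode_request(payload: str) -> str:
    """Recover the parameter string from a request path (without its leading '/')."""
    core = payload[PREFIX_LEN:-SUFFIX_LEN]
    core += "=" * (-len(core) % 4)                 # restore Base64 padding if it was dropped
    return base64.b64decode(core).decode("ascii", errors="replace")


params = "v=5.4&q=bike&eng=www.google.com&br=iexplore"
payload = encode_request(params)
recovered = decode_request(payload)
```

One plausible reading of the 13-character prefix is that, because 13 is not a multiple of 4, it
shifts the Base64 four-character groups out of alignment, so decoding the whole path as Base64
yields unrelated bytes rather than the parameter string.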

^The  malware  generates  the  padding  characters  using  the  Windows  random  number  API. 
IP Address        v=   Pseudo-Domain                     Purpose
195.3.145.108     5.4  dclixvfpttrlcnindvrnyeic.com      Search request
                  5.4  evtrdtikvzwpscvrxpr.com
                  5.3  atenrqqtfrzozqrqbdzwkxzyuc.com
83.133.120.186    5.4  gozapirimagbclxbwin.com           Search request
                  5.4  nbqkgysciuuhadgpjfquvpu.com
                  5.3  cjelaglawfoyidgyapv.com
83.133.120.187    5.4  jpciukjdkqxgreoikpgya.com         Search request
                  5.4  qhdsxosxtvmhurwezsipzq.com
                  5.3  omakfdwkhrpqudxvapy.com *
217.23.3.225      5.4  hzhrjmeeczcgxodmqyz.com           Search request
                  5.4  fnyxzjeqxzdpeocarhljdmyjk.com
                  5.3  sqdfmslznztfozshtidmigmsbh.com ^
217.23.3.242      5.4  vdlhxlmqhfafeovqohwrbaskrh.com    Search request
                  5.4  nmfvaofnginwocnidecxnpcs.com
                  5.3  euuqddlxgrnxlrjjbhytukpz.com ^
188.40.114.195    2.1  qvhobsbzhzhdhenvzbs.com           Click confirmation
188.40.114.228 *  2.1  mbbcmyjwgypdcujuuvrlt.com         Click confirmation
                  2.0  wuyigrpdappakoahb9.com
217.23.9.247      6.1  vzsjfnjwchfqrvylhdhxa.com         Flash player identification
                  5.8  vjlvchretllifcsgynuq.com
83.133.124.191 *  5.6  chvhcncpqttfpcibtmetg.com         Flash player identification
178.239.55.170    1.2  jgvkfxhkhbbjoxggsve.com           Unknown / JavaScript injection
83.133.120.16     1.2  xlotxdxtorwfmvuzfuvtspel.com      Unknown / JavaScript injection
                  1.1  mkvrpknidkurcrftiqsfjqdxbn.com
83.133.124.191    -    ezcfogjitbqwnornezx.com           Fallback
                  -    rwdtklvrqnffdqkyuugfklip.com
                  -    uinrpbrfrnqggtorjdpqg.com
188.40.114.228    -    jzlevndwetzyfryruytkzkb.com       Fallback
                  -    glzhbnbxqtjoasaeyftwdmhzjd.com
                  -    kttvkzpwufmrditdojIgytxyb.com
46.249.59.47      -    loanxohaktcocrovagkaa.com         Fallback
                  -    mxyawkwuwxdhuaidissclggy.com
                  -    erspiwscuqslhjfIgbbgcfbe.com
46.249.59.48      -    spujplpdupiwbghiedhqeja.com       Fallback
                  -    xttfdqrsvlkvmtewgiqolttqi.com
217.23.9.140      -    dxgplrlsljdjhqzqajkcau.com        Fallback

Table 5: Pseudo-domains and IP addresses extracted from the search-hijacking module via malware
executions and reverse engineering. The v= column shows the value of the v argument used in
requests. The Purpose column lists the class of commands sent to the SH-C&C server. Communication
attempts to (and DNS requests for) Fallback IP addresses occur when the malware is unable to
establish communication with a pseudo-domain selected for another function. When this occurs, the
original SH-C&C message is sent to the fallback IP address. In this case, a pseudo-domain
corresponding to the fallback address does not appear in the Host field; instead, the original
pseudo-domain appears. Pseudo-domains labeled with ^ were discovered via reverse engineering, but
not verified in observations of network requests, presumably due to limited executions. IP
addresses labeled with * reflect addresses that were unexpected given the associated
pseudo-domains. This anomalous behavior occurred in a very small number of executions, perhaps due
to some kind of bug, or related to the fallback domains. We inferred the IP addresses associated
with predicted but not observed pseudo-domains via the de-obfuscation algorithm.
[Figure 4 (diagram), reduced to its labels:]

Parameter string (q carries the user's search query):
    v=5.4&id=···&aid=···&sid=···&q=bike&eng=www.google.com&os=···&br=iexplore&s=···

Step 1, Base64 encode:
    dj01LjQmaWQ9WFh...GxvcmUmcz1YWFgK

Step 2, Concatenate:
    kr8KeUk34kWad                        13 random Base64 chars
    dj01LjQmaWQ9WFh...GxvcmUmcz1YWFgK    Base64-encoded parameter string
    dhelJtfzYP                           10 random Base64 chars

Result:
    HTTP GET /kr8KeUk34kWaddj01LjQmaWQ9WFh...GxvcmUmcz1YWFgKdhelJtfzYP

Figure 4: Encoding of a ZeroAccess search-hijacking module search request. When the user issues a
search query, the module requests a list of URLs to which the user should be redirected. The
user’s query and other module parameters are combined using the standard URL parameter encoding
scheme. The resulting string is then Base64-encoded and padded by prepending 13 and appending 10
apparently random Base64 encoding characters. The resulting string is then used to form the HTTP
GET request to the SH-C&C server. (Values denoted “···” have been truncated for space in this
example.)


IP Address       Location
217.23.3.223     Netherlands
83.133.127.85    Germany

Table 6: IP addresses of the first hop servers in the ad click redirection chain for the
search-hijacking module.


6.2.2  Primary  Rendezvous 

Hardcoded into each version of the ZeroAccess search-hijacking module is a list of .com domains of
the form shown in Table 5. However, these domains are not resolved in the usual way using the
Domain Name System. Instead, each domain encodes an IP address directly. To make the distinction
clear, we call these pseudo-domain names. To obtain the IP address of a command-and-control
server, the module decodes one of the pseudo-domain names to an IP address using the algorithm
given in Appendix A.2, which we extracted from the module binary. In addition to decoding to an IP
address, the domain name may have been used for authentication as described in the next section.

Table 5 lists all pseudo-domain names associated with the search-hijacking module that we
observed, together with the IP addresses they decode to. In addition, Table 6 lists the IP
addresses of the first hop servers in the search-hijacking ad click redirection chain.

6.2.3  Authentication 

Normally, an HTTP interaction progresses as follows: (1) the browser resolves the domain name in
the URL (e.g., www.google.com) to an IP address; (2) the browser connects to the Web server at the
given address; (3) in its request, the browser sends a Host header specifying the domain name.
Instead, the bot client skips the first step and directly extracts from the pseudo-domain the
associated IP address encoded in the name. It then connects to the Web server and still includes a
Host header with the pseudo-domain, even though it never resolved that name, and in fact could not
since the name is unregistered in some cases.

During our early exploration of the botnet, we observed that manipulating or removing the Host
header resulted in the SH-C&C protocol responding to messages with errors. After subsequent
updates to the SH-C&C protocol, however, we were unable to reproduce this behavior.

This behavior, when active, could reflect usage of the domain name as a way to authenticate
legitimate bot clients to the SH-C&C server. No normal Web browser can reach the server via the
domain name since it is not registered; and presumably no scanner trying to find Web servers will
know which domain name to include in the Host header to look like a bot client.
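The hypothesized check can be sketched from the server's point of view: extract the Host header
from a raw request and accept the request only if the header names an expected pseudo-domain. This
is purely our own illustration of the idea; we have no visibility into how the SH-C&C server
actually implemented it, and the function names are hypothetical.

```python
from typing import Iterable, Optional


def parse_host_header(raw_request: bytes) -> Optional[str]:
    """Return the lower-cased Host header value (without any :port), or None if absent."""
    head = raw_request.split(b"\r\n\r\n", 1)[0]
    for line in head.split(b"\r\n")[1:]:          # skip the request line
        name, sep, value = line.partition(b":")
        if sep and name.strip().lower() == b"host":
            host = value.strip().decode("ascii", errors="replace").lower()
            return host.split(":")[0]             # drop an optional :port suffix
    return None


def passes_host_check(raw_request: bytes, expected: Iterable[str]) -> bool:
    """Accept only requests whose Host header names one of the expected pseudo-domains."""
    host = parse_host_header(raw_request)
    return host is not None and host in set(expected)


# One pseudo-domain from Table 5, used here only as sample input.
EXPECTED = ["qvhobsbzhzhdhenvzbs.com"]
bot_like = b"GET /abc HTTP/1.1\r\nHost: qvhobsbzhzhdhenvzbs.com\r\n\r\n"
scanner_like = b"GET / HTTP/1.1\r\nHost: 188.40.114.195\r\n\r\n"
```

A scanner that connects by IP address would typically send the address itself (or nothing) in the
Host header, which is why such a check would separate it from a bot client.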

6.2.4  Encryption 

In response to an SH-C&C message, the server sends back an HTTP octet-stream of encrypted
ciphertext. Appendix A.1 contains details on the decryption algorithm. (We obtained the algorithm
via static reverse engineering.) The response to search result C&C requests, once decrypted,
provides a list of zero or more replacement ad URLs. These URLs are used in Step 5 of the
hijacking process.

The target for each ad URL is a first hop ad server (Table 6), which when visited will begin a
302-redirect chain. Along with this ad click URL is another URL to be used as a forged Referer
field in the subsequent ad fetch.

6.2.5  Rate  Limiting 

During our interaction with the ZeroAccess SH-C&C we observed advertisement click rate limiting.
When we performed frequent searches from a particular IP address, the SH-C&C initially returned a
large number of ads (more than 5) per query. The more we interacted with the advertisements,
though, the fewer ads were returned from subsequent servers. Specifically, we observed rate
limiting across the following dimensions independently: source IP address, search term, and
affiliate ID.

Rate limiting based on IP address or affiliate ID may indicate that the botnet is attempting to
limit the amount of fraud performed by a particular entity, in order to avoid detection. Rate
limiting based on search term may reflect a limited supply of relevant ads for a particular term.

6.2.6  Secondary  Rendezvous 

Prior to establishing a connection to an IP address derived from an obfuscated SH-C&C domain, the
ZeroAccess malware performs a DNS request (for an A record) for the domain. This DNS request is
generated by the ZeroAccess malware itself, and does not use any of the traditional Windows APIs
for resolving domains. The request always has DNS transaction ID 0x3333 and is always sent to
Google’s DNS server at 8.8.8.8.
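Because the transaction ID and resolver are fixed, these lookups are straightforward to recognize
in captured traffic. The following sketch checks a UDP DNS payload against the two properties
reported above. The function names are hypothetical, and `build_a_query` exists only to produce
sample input; its flag value is our assumption, not something taken from the malware.

```python
import struct

ZA_TXID = 0x3333          # fixed DNS transaction ID reported in the text
ZA_RESOLVER = "8.8.8.8"   # fixed resolver reported in the text


def build_a_query(name: str, txid: int) -> bytes:
    """Build a minimal DNS A-record query (sample input only; flags are an assumption)."""
    header = struct.pack("!HHHHHH", txid, 0x0100, 1, 0, 0, 0)
    qname = b"".join(bytes([len(p)]) + p.encode("ascii") for p in name.split(".")) + b"\x00"
    return header + qname + struct.pack("!HH", 1, 1)   # QTYPE=A, QCLASS=IN


def matches_zeroaccess_lookup(payload: bytes, dst_ip: str) -> bool:
    """True for a single-question A query with ID 0x3333 sent to 8.8.8.8."""
    if dst_ip != ZA_RESOLVER or len(payload) < 12:
        return False
    txid, flags, qdcount = struct.unpack("!HHH", payload[:6])
    if txid != ZA_TXID or (flags & 0x8000) or qdcount != 1:   # QR bit set means a response
        return False
    pos = 12
    while pos < len(payload) and payload[pos] != 0:           # walk the QNAME labels
        pos += payload[pos] + 1
    if pos + 5 > len(payload):
        return False
    qtype, qclass = struct.unpack("!HH", payload[pos + 1:pos + 5])
    return qtype == 1 and qclass == 1


query = build_a_query("qvhobsbzhzhdhenvzbs.com", ZA_TXID)
other = build_a_query("qvhobsbzhzhdhenvzbs.com", 0x1234)
```

On its own, a match is only a weak indicator, since any client could by chance pick the same
transaction ID; it becomes more useful when repeated matches come from one host.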

Through both static reverse engineering and live malware executions, we have been unable to
ascertain the purpose for this DNS activity. When a new version of the search-hijacking module is
released, the domains associated with that module are generally not registered. Throughout the
lifetime of the module various domains will become registered, sometimes by security researchers.
However, the behavior of the ZeroAccess malware does not appear to be affected by the response to
these DNS requests. The malware never uses the IP addresses returned by these queries, and its
execution continues independent of the resolution status of the domain.

Other malware families have used similar functionality as a secondary rendezvous utilized to
regain control of the botnet in the event of a takedown [10]. Although we do not believe the
current ZeroAccess versions have such behavior, we are unable to definitively rule out such
behavior.

6.2.7  Additional  Functionality 

In addition to the primary SH-C&C message that sends search terms and receives replacement ad
URLs, there are three other distinct types of SH-C&C communication. The four SH-C&C messages have
the same behavior with respect to how messages are formatted, obfuscated, and encrypted. These
messages correspond with the Purpose categories in Table 5. Each of the four message types uses
the same primary rendezvous technique, although each pulls from a distinct set of domain names.

The three additional SH-C&C messages have the following syntax:

v=1.2&id=%u&aid=%u&sid=%u&os=%s

v=6.1&id=%08x&aid=%u&sid=%u&os=%s&fp=%s&ad=%u

v=2.1&id=%08x&aid=%u&sid=%u&kw=%s&url=%s&ref=%s&os=%s

Click confirmation: After a user’s click is hijacked, the malware sends a message of type v=2.1 to
an SH-C&C server, reporting the click URL as the url parameter. In response to this message the
SH-C&C server may direct the malware to perform additional clicks.

Flash player identification: Messages of type v=6.1 report the user’s Flash Player version to
SH-C&C servers. The version is relayed via the fp parameter.

Unknown / JavaScript injection: The intended purpose of type v=1.2 messages is unknown. In
practice these messages occur far less frequently than the other types of communication. In
response to this message from the malware, the SH-C&C will occasionally respond with ad network
JavaScript, which we suspect is then injected into webpages viewed by the user.
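Once a request has been decoded back to its parameter string, the v value is enough to sort it
into one of the four message classes. The sketch below does this with a lookup table built from
the v= and Purpose columns of Table 5; it assumes that every row in a Table 5 group shares the
purpose shown on the group's first row. The function name and sample values are illustrative only.

```python
from urllib.parse import parse_qs

# v value -> message class, read off Table 5 (grouping assumption noted above).
PURPOSE_BY_VERSION = {
    "5.3": "Search request",
    "5.4": "Search request",
    "2.0": "Click confirmation",
    "2.1": "Click confirmation",
    "5.6": "Flash player identification",
    "5.8": "Flash player identification",
    "6.1": "Flash player identification",
    "1.1": "Unknown / JavaScript injection",
    "1.2": "Unknown / JavaScript injection",
}


def classify_message(param_string: str) -> str:
    """Map a decoded SH-C&C parameter string to its message class via the v field."""
    fields = parse_qs(param_string, keep_blank_values=True)
    version = fields.get("v", [""])[0]
    return PURPOSE_BY_VERSION.get(version, "Unrecognized")


# Sample parameter strings with made-up field values.
search = "v=5.4&id=0000002a&aid=1&sid=2&q=bike&eng=www.google.com&os=x&br=iexplore&s=1"
flash = "v=6.1&id=0000002a&aid=1&sid=2&os=x&fp=11.9&ad=0"
search_class = classify_message(search)
flash_class = classify_message(flash)
```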

6.3  Module  History 

Between May 2013 and November 2013 we observed two distinct versions of the search-hijacking
module (independent of changes to the included pseudo-domains). Initially (May and part of June),
the v parameter observed in search request messages was 5.3. During this time the response to
search queries was a list of plaintext advertisements. In June, the value of the v parameter
changed to 5.4, and the response to search queries took on the encrypted form described above.

The v identifier for the other three categories of commands has also changed over time, as shown
in Table 5.

We have also examined samples that had different pseudo-domain names hardcoded into them. Although
the pseudo-domain names differ between samples, they generally decode to the same set of IP
addresses.

6.4  Advertising  Networks 

Over time, from our observations and others [22], this module has been seen to defraud a large
number of ad networks, including Vsearch, Affinity, Domain Development Corporation and Hoist
Media. Some of these ad networks overlap with those seen defrauded by the click fraud module, but
some we have only seen defrauded by the search-hijacking module.

7  Conclusion 

In this report we have described two ZeroAccess modules: the auto-clicking module, which performs
traditional click fraud by simulating user clicks on advertisements, and a more recent
search-hijacking module, which intercedes upon user clicks on Web search results, instead sending
the user to an advertisement related to the search. In both cases, the botmaster stands to earn a
commission from the click. We documented technical specifics for both forms in detail, including
key infrastructure components (domain names and IP addresses corresponding to C&C servers) we have
discovered via reverse-engineering of the modules and by observation of the malware’s live
execution.
A  ZeroAccess Algorithms

A.1  ZeroAccess Search-Hijacking Ad List Decryption Algorithm

The SH-C&C server encrypts its responses to requests by the search-hijacking module for
replacement ad URLs. The following code listing gives the decryption algorithm for this response.
We extracted the code from the binary module and decompiled it using the IDA Pro Hex-Rays
decompiler.

// Usage:
//   char cipher_text[4096];
//   uint32_t cipher_len = fread(cipher_text, ..);
//   char cipher_len_str[256];
//   uint32_t strlen_cipher_len_str = sprintf(cipher_len_str, "%u", cipher_len);
//   decrypt(
//       cipher_len_str,
//       cipher_text,
//       cipher_len,
//       strlen_cipher_len_str
//   );

#include <stdint.h>

int decrypt(
    char* cipher_len_str,
    char* cipher_text,
    uint32_t cipher_len,
    uint32_t strlen_cipher_len_str)
{
    char* v4;
    uint32_t v5;
    int32_t v6, v7, result;
    char v9;
    int32_t v10;
    char v11;
    int32_t v12, v13, v14;
    char v15;
    int32_t v16;
    char v17;
    int32_t v18;
    uint8_t v19;
    int32_t v20;
    uint32_t v21;
    uint8_t v22;
    char v23;
    char *v24;
    char v25, v26;
    int32_t v27;
    char arr[256];

    // Initialize the 256-byte state table to the identity permutation.
    v4 = cipher_len_str;
    v26 = 0;
    v5 = 0;
    do {
        *(arr + v5) = (char)v5;
        ++v5;
    } while (v5 < 256);

    // Mix the key (the ciphertext length as a decimal string) into the table.
    // The loop body handles four table entries per iteration.
    v6 = 0;
    v7 = 0;
    v27 = 0;
    result = 0;
    while (1) {
        v9 = *(arr + v6);
        if (result >= strlen_cipher_len_str)
            result = 0;
        v10 = (uint8_t)(v7 + v9 + *(char *)(result + v4));
        *(arr + v6) = *(arr + v10);
        *(arr + v10) = v9;
        v11 = *(arr + 1 + v27);
        v12 = result + 1;
        if (v12 >= strlen_cipher_len_str)
            v12 = 0;
        v13 = (uint8_t)(v10 + v11 + *(char *)(v12 + v4));
        *(arr + 1 + v27) = *(arr + v13);
        v14 = v12 + 1;
        *(arr + v13) = v11;
        v15 = *(arr + 2 + v27);
        if (v14 >= strlen_cipher_len_str)
            v14 = 0;
        v16 = (uint8_t)(v13 + v15 + *(char *)(v14 + v4));
        *(arr + 2 + v27) = *(arr + v16);
        *(arr + v16) = v15;
        v17 = arr[v27 + 3];
        v18 = v14 + 1;
        if (v18 >= strlen_cipher_len_str)
            v18 = 0;
        v19 = v16 + v17 + *(char *)(v18 + v4);
        v20 = v27;
        v7 = v19;
        arr[v27 + 3] = *(arr + v7);
        *(arr + v7) = v17;
        result = v18 + 1;
        v27 += 4;
        if ((unsigned int)(v20 + 4) >= 0x100)
            break;
        v6 = v27;
    }

    // Generate one keystream byte per ciphertext byte and XOR it in place.
    v21 = 0;
    if (cipher_len) {
        v22 = 0;
        do {
            ++v22;
            v23 = *(arr + v22);
            v24 = arr + (uint8_t)(v23 + v26);
            v26 += v23;
            v25 = *v24;
            *(arr + v22) = *v24;
            *v24 = v23;
            result = (uint8_t)(v25 + v23);
            *(char *)(v21++ + cipher_text) ^= *(arr + result);
        } while (v21 < cipher_len);
    }

    return result;
}
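Read structurally, the listing above has the shape of the textbook RC4 stream cipher: an identity
table, a key-mixing pass (unrolled four entries at a time), and a keystream loop that XORs into
the buffer, with the key being the ciphertext length written as a decimal string. Under that
reading, which is our interpretation of the decompiled code rather than a statement from the
module's authors, an equivalent compact Python version would be:

```python
def rc4(key: bytes, data: bytes) -> bytes:
    """Textbook RC4: key scheduling followed by keystream XOR."""
    s = list(range(256))
    j = 0
    for i in range(256):                          # key-scheduling pass
        j = (j + s[i] + key[i % len(key)]) & 0xFF
        s[i], s[j] = s[j], s[i]
    out = bytearray()
    i = j = 0
    for byte in data:                             # keystream generation
        i = (i + 1) & 0xFF
        j = (j + s[i]) & 0xFF
        s[i], s[j] = s[j], s[i]
        out.append(byte ^ s[(s[i] + s[j]) & 0xFF])
    return bytes(out)


def decrypt_ad_list(cipher_text: bytes) -> bytes:
    """Decrypt a response using its own length, as a decimal string, for the key."""
    key = str(len(cipher_text)).encode("ascii")
    return rc4(key, cipher_text)


# RC4 is symmetric, so running the routine twice restores the input.
blob = decrypt_ad_list(b"http://example.invalid/ad?id=1\n")
round_trip = decrypt_ad_list(blob)
```

Since the key is derived entirely from the message length, the scheme obscures the response but
does not depend on any secret shared between the module and the server.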

A.2  ZeroAccess  Search-Hijacking  Pseudo-domain  to  IP  Algorithm 

The following Python fragment gives an algorithm for decoding IP addresses in the search-hijacking
module represented as (pseudo-)domain names. We developed the code based on decompiling the
binary.


from binascii import crc32
from struct import pack
from socket import inet_ntoa

def deobfuscate_domain(d):
    # crc32() requires bytes on Python 3; pseudo-domain names are plain ASCII.
    if isinstance(d, str):
        d = d.encode("ascii")

    # Each octet of the address is the low byte of a CRC-32 computed over one
    # slice of the name, each with its own starting value.
    b0 = crc32(d[0:5], 0x7E873D53) & 0xFF
    b1 = crc32(d[5:9], 0x570848EB) & 0xFF
    b2 = crc32(d[9:12], 0x768772F3) & 0xFF
    b3 = crc32(d[12:17], 0x4775114F) & 0xFF

    ip_as_int = b0 + (b1 << 8) \
                   + (b2 << 16) \
                   + (b3 << 24)

    # Pack little-endian so that b0 becomes the first octet of the address.
    packed_ip = pack('<I', ip_as_int)
    return inet_ntoa(packed_ip)